Spatially varying nanophotonic neural networks
Abstract
The explosive growth in computation and energy cost of artificial intelligence has spurred interest in alternative computing modalities to conventional electronic processors. Photonic processors, which use photons instead of electrons, promise optical neural networks with ultralow latency and power consumption. However, existing optical neural networks, limited by their designs, have not achieved the recognition accuracy of modern electronic neural networks. In this work, we bridge this gap by embedding parallelized optical computation into flat camera optics that perform neural network computations during capture, before recording on the sensor. We leverage large kernels and propose a spatially varying convolutional network learned through a low-dimensional reparameterization. We instantiate this network inside the camera lens with a nanophotonic array with angle-dependent responses. Combined with a lightweight electronic back-end of about 2K parameters, our reconfigurable nanophotonic neural network achieves 72.76% accuracy on CIFAR-10, surpassing AlexNet (72.64%), and advancing optical neural networks into the deep learning era.
INTRODUCTION
Increasing demands for high-performance artificial intelligence (AI) in the last decade have levied immense pressure on computing architectures across domains, including robotics, transportation, personal devices, medical imaging, and scientific imaging. Although electronic microprocessors have undergone drastic evolution over the past 50 years (1), providing us with general-purpose central processing units and custom accelerator platforms [e.g., graphics processing units (GPUs) and digital signal processor (DSP) ASICs], this growth rate is far outpaced by the explosive growth of AI models. Specifically, Moore's law delivers a doubling in transistor counts every 2 years (2), whereas deep neural networks (DNNs) (3), arguably the most influential algorithms in AI, have doubled in size every 6 months (4). However, the end of voltage scaling has made power consumption, not the number of transistors, the principal factor limiting further improvements in computing performance (5). Overcoming this limitation and radically reducing compute latency and power consumption could drive unprecedented applications, from low-power edge computation in the camera, potentially enabling computation in thin eyeglasses or microrobots, to reduced power consumption in the data centers used for training neural network architectures.
Optical computing has been proposed as a potential avenue to alleviate several inherent limitations of digital electronics, e.g., compute speed, heat dissipation, and power, and could potentially boost computational throughput, processing speed, and energy efficiency by orders of magnitude (6–10). Such optical computers leverage several advantages of photonics to achieve high throughput, low latency, and low power consumption (11). These performance improvements are achieved by sacrificing reconfigurability. Thus, although general-purpose optical computing has yet to be practically realized due to obstacles such as larger physical footprints and inefficient optical switches (12, 13), several notable advances have already been made toward optical/photonic processors tailored specifically for AI (14, 15). Representative examples include optical computers that perform widely used signal processing operators (16–22), e.g., spatial/temporal differentiation, integration, and convolution, with performance far beyond that of contemporary electronic processors. Most notably, optical neural networks (ONNs) (6, 23–38) can perform AI inference tasks such as image recognition when implemented as fully optical or hybrid opto-electronic computers.
Existing ONNs can be broadly classified into two categories: those based on integrated photonics (24–30) [e.g., Mach-Zehnder interferometers (23, 26), phase change materials (24), microring resonators (29), and multimode fibers (30)], which physically realize multiply-add floating-point operations (FLOPs), and those based on free-space optics (6, 31–37), which implement convolutional layers through light propagation through diffractive elements [e.g., 3D-printed surfaces (6), 4F optical correlators (37), optical masks (35), and metasurfaces (36)]. The design of these ONN architectures has been fundamentally restricted by the underlying network design, including the challenge of scaling to large numbers of neurons (within integrated photonic circuits) and the lack of scalable, energy-efficient nonlinear optical operators. As a result, even the most successful ensemble ONNs (31), which use dozens of ONNs in parallel, have only achieved LeNet (39)–level accuracy on image classification, a level their electronic counterparts reached over 30 years ago. Moreover, most high-performance ONNs can only operate under coherent illumination, prohibiting integration into camera optics under natural lighting conditions. Although hybrid opto-electronic networks (35, 36, 40) that work on incoherent light do exist, most of them do not yield favorable results because their optical front-ends are designed for small-kernel, spatially uniform convolutional layers, which this work finds does not fully exploit the design space available for optical convolution.
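The distinction drawn above between spatially uniform and spatially varying convolution can be made concrete with a minimal NumPy sketch: a uniform layer shares one kernel across the whole image, whereas a spatially varying layer assigns each image region its own kernel. The tile layout, kernel size, and function names below are illustrative choices of ours, not the paper's implementation.

```python
import numpy as np

def conv2d_same(img, kernel):
    """Direct 2D cross-correlation with zero padding (odd-sized kernel assumed)."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)))
    out = np.empty_like(img, dtype=float)
    for y in range(img.shape[0]):
        for x in range(img.shape[1]):
            out[y, x] = np.sum(padded[y:y + kh, x:x + kw] * kernel)
    return out

def spatially_varying_conv(img, kernels, grid=(2, 2)):
    """Apply a different kernel to each tile of the image.

    A spatially uniform layer would use one kernel everywhere; here the
    grid[0] x grid[1] tiles each get their own kernel, multiplying the
    number of free parameters by the number of tiles.
    """
    gh, gw = grid
    th, tw = img.shape[0] // gh, img.shape[1] // gw
    out = np.zeros_like(img, dtype=float)
    for i in range(gh):
        for j in range(gw):
            tile = img[i * th:(i + 1) * th, j * tw:(j + 1) * tw]
            out[i * th:(i + 1) * th, j * tw:(j + 1) * tw] = conv2d_same(
                tile, kernels[i * gw + j])
    return out
```

Setting every tile's kernel equal recovers a (block-wise) uniform convolution, so the uniform case is a strict subset of this parameterization.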
In this work, we report a novel nanophotonic neural network that lifts the aforementioned limitations, allowing us to close the gap to the first modern DNN architectures (41) with optical compute in a flat form factor of only 4 mm length, akin to performing computation on the sensor cover glass, in lieu of a bulky compound 4-f system–based Fourier filter setup (40). We leverage the ability of a lens system to perform large-kernel spatially varying (LKSV) convolutions tailored specifically for image recognition and semantic segmentation. These operations are performed during capture, before the sensor makes a measurement. We learn large kernels via low-dimensional reparameterization techniques, which circumvent the spurious local extrema caused by direct optimization. To physically realize the ONN, we develop a differentiable spatially varying inverse design framework that solves for metasurfaces (42–46) that produce the desired angle-dependent responses under spatially incoherent illumination. Because of its compact footprint and complementary metal-oxide semiconductor (CMOS) sensor compatibility, the resulting optical system is not only a photonic accelerator but also an ultracompact computational camera that directly operates on ambient light from the environment before analog-to-digital conversion. We find that this approach facilitates generalization and transfer learning to other tasks, reaching performance comparable to AlexNet (41) on 1000-category ImageNet (47) classification and PASCAL VOC (48) semantic segmentation.
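The paper's exact low-dimensional reparameterization is specified in its supplement; as one illustrative possibility, a large kernel can be expressed as a low-rank sum of separable terms so that far fewer scalars are optimized than there are kernel entries. The factorization and names below are our assumption, sketched to convey the idea.

```python
import numpy as np

def lowrank_kernel(U, V, coeffs):
    """Build a large 2D kernel from few parameters as a sum of rank-1 terms:

        K = sum_r coeffs[r] * outer(U[:, r], V[:, r])

    For a kH x kW kernel of rank R this uses R * (kH + kW + 1) parameters
    instead of kH * kW, shrinking the search space that direct per-entry
    optimization would have to navigate.
    """
    # einsum contracts the shared rank index r: K[h, w] = sum_r U[h,r] V[w,r] c[r]
    return np.einsum("hr,wr,r->hw", U, V, coeffs)
```

For a 50 x 50 kernel at rank 4, this is 404 parameters rather than 2500, and gradients flow through the factors during training just as they would through raw kernel entries.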
Recent work (49) concurrent to ours reported a novel metasurface doublet that implements a multichannel optical convolution via angular and polarization multiplexing under spatially incoherent illumination, and extensions (50, 51) leverage large convolutional kernels for image classification and semantic segmentation. While this work shares advantages with ours, such as multichannel operation, high performance, and the use of incoherent light, our method uses a single metasurface and relies on LKSV convolution instead of uniform convolutions, increasing the parameter space by an order of magnitude.
Hence, by on-chip integration of the flat-optics front-end (>99% of FLOPs) with an extremely lightweight electronic back-end (<1% of FLOPs), we achieve higher classification performance than modern fully electronic classifiers [73.80% in simulation and 72.76% in experiment, compared to 72.64% by AlexNet (41) on the CIFAR-10 (52) test set] while simultaneously reducing the number of electronic parameters by four orders of magnitude, thus bringing ONNs into the deep learning era.
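A quick sanity check of the parameter-reduction claim, using the commonly cited ~61M parameter count for AlexNet and the ~2K back-end size quoted in the abstract (both figures approximate):

```python
import math

# AlexNet is commonly cited at roughly 61 million learnable parameters;
# the electronic back-end here uses about 2,000 (from the abstract).
alexnet_params = 61_000_000
backend_params = 2_000
orders = math.log10(alexnet_params / backend_params)
print(f"{orders:.1f} orders of magnitude")  # roughly 4.5
```

The ratio is about 3 x 10^4, consistent with the "four orders of magnitude" statement.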
RESULTS
LKSV parameterization
The working principle and optoelectronic implementation of the proposed spatially varying nanophotonic neural network (SVN3) are illustrated in Fig. 1A. The SVN3 is an optoelectronic neuromorphic computer that comprises a metalens array nanophotonic front-end and a lightweight electronic back-end (embedded in a low-cost microcontroller unit) for image classification or semantic segmentation. The metalens array front-end consists of 50 metalens elements that are made of 390-nm-pitch nano-antennas and are optimized for incoherent light in a band around 525 nm. The modulation induced by each metalens can be represented as the optical convolution of the incident field with the point spread function (PSF) of that individual device. Therefore, the nanophotonic front-end performs parallel multichannel convolutions, at the speed of light, without any power consumption. We also refer to texts S1 and S3 for additional details on the physical forward model and the neural network design, respectively.
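The forward model described above, with each metalens contributing one convolution channel whose kernel is that element's PSF, can be sketched as follows. The PSF arrays are placeholders, not the fabricated designs; note that an incoherent system convolves intensities, so realizable PSF kernels are nonnegative, and signed weights must be formed later (e.g., electronically, as differences of channels).

```python
import numpy as np

def conv2d_same(img, kernel):
    """True 2D convolution (kernel flipped) with zero padding; odd-sized kernel."""
    k = kernel[::-1, ::-1]
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)))
    out = np.empty_like(img, dtype=float)
    for y in range(img.shape[0]):
        for x in range(img.shape[1]):
            out[y, x] = np.sum(padded[y:y + kh, x:x + kw] * k)
    return out

def incoherent_capture(scene, psfs):
    """One output channel per metalens element: channel_k = scene (*) psf_k.

    scene: (H, W) nonnegative intensity image.
    psfs:  (C, kH, kW); each PSF is nonnegative and normalized to sum to 1,
           since incoherent optics can only form nonnegative kernels.
    """
    return np.stack([conv2d_same(scene, p) for p in psfs])
```

All C channels form in a single exposure, which is the sense in which the front-end computes its convolutions in parallel at the speed of light.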